Retro: Targeted Resource Management in Multi-tenant Distributed Systems
Authors: Jonathan Mace, Peter Bodik, Rodrigo Fonseca, Madanlal Musuvathi
Abstract
In distributed systems shared by multiple tenants, effective resource management is an important prerequisite to providing quality of service guarantees. Many systems deployed today lack performance isolation and experience contention, slowdown, and even outages caused by aggressive workloads or by improperly throttled maintenance tasks such as data replication. In this work we present Retro, a resource management framework for shared distributed systems. Retro monitors per-tenant resource usage both within and across distributed systems, and exposes this information to centralized resource management policies through a high-level API. A policy can shape the resources consumed by a tenant using Retro’s control points, which enforce sharing and rate-limiting decisions. We demonstrate Retro through three policies providing bottleneck resource fairness, dominant resource fairness, and latency guarantees to high-priority tenants, and evaluate the system across five distributed systems: HBase, Yarn, MapReduce, HDFS, and Zookeeper. Our evaluation shows that Retro has low overhead and achieves the policies’ goals: it accurately detects contended resources, throttles the tenants responsible for slowdown and overload, and fairly distributes the remaining cluster capacity.
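The abstract separates measurement and policy (centralized) from enforcement (control points inside each system). As a minimal Python sketch of that split, assuming a token-bucket control point and a toy policy, the code below lets a central function compute each tenant's dominant share (its largest fractional use of any single resource, the quantity behind dominant resource fairness) and then lower the admission rate of the tenant with the largest share. Every name here (`TokenBucket`, `ControlPoint`, `set_rate`, `throttle_dominant`, the resource keys) is hypothetical and is not Retro's API; the policy only borrows the dominant-share idea and is not Retro's DRF policy.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `capacity` tokens."""
    rate: float
    capacity: float
    tokens: float
    last: float = 0.0

    def try_acquire(self, cost: float, now: float) -> bool:
        # Add the tokens earned since the previous call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


@dataclass
class ControlPoint:
    """Hypothetical control point: decides per tenant whether to admit a unit of work."""
    default_rate: float
    buckets: dict = field(default_factory=dict)

    def _bucket(self, tenant: str) -> TokenBucket:
        # Lazily create a full bucket at the default rate for unseen tenants.
        if tenant not in self.buckets:
            r = self.default_rate
            self.buckets[tenant] = TokenBucket(rate=r, capacity=r, tokens=r)
        return self.buckets[tenant]

    def rate_of(self, tenant: str) -> float:
        return self._bucket(tenant).rate

    def set_rate(self, tenant: str, rate: float) -> None:
        # Invoked by the central policy; capacity = rate limits bursts to ~1 s of work.
        bucket = self._bucket(tenant)
        bucket.rate = rate
        bucket.capacity = rate

    def admit(self, tenant: str, cost: float, now: float) -> bool:
        return self._bucket(tenant).try_acquire(cost, now)


def dominant_share(usage: dict, capacity: dict) -> float:
    # Largest fraction of any single resource that this tenant consumes.
    return max(usage[r] / capacity[r] for r in usage)


def throttle_dominant(tenant_usage: dict, capacity: dict,
                      control: ControlPoint, factor: float = 0.5) -> str:
    """Toy policy: scale down the rate of the tenant with the largest dominant share."""
    shares = {t: dominant_share(u, capacity) for t, u in tenant_usage.items()}
    worst = max(shares, key=shares.get)
    control.set_rate(worst, control.rate_of(worst) * factor)
    return worst


if __name__ == "__main__":
    cp = ControlPoint(default_rate=100.0)
    capacity = {"disk_mb_s": 100.0, "cpu_cores": 8.0}
    usage = {
        "tenant-a": {"disk_mb_s": 30.0, "cpu_cores": 6.0},  # dominant share 0.75 (CPU)
        "tenant-b": {"disk_mb_s": 60.0, "cpu_cores": 1.0},  # dominant share 0.6 (disk)
    }
    print(throttle_dominant(usage, capacity, cp))  # tenant-a
    print(cp.rate_of("tenant-a"))  # 50.0
```

Keeping the bucket logic inside the control point and the decision logic in a separate function mirrors the split the abstract describes: enforcement stays cheap and local, while the policy sees usage across all tenants and only pushes new rates.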
Similar resources
Towards General-Purpose Resource Management in Shared Cloud Services
Authors: Jonathan Mace, Peter Bodik, Madan Musuvathi, Rodrigo Fonseca
In distributed services shared by multiple tenants, managing resource allocation is an important prerequisite to providing dependability and quality of service guarantees. Many systems deployed today experience contention, slowdown, and even system outages due to aggressive tenants and a lack of resource management. Improperly throttled background tasks, such as data replication, can overwhelm...
Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
Authors: Jonathan Mace, Ryan Roelke, Rodrigo Fonseca
Monitoring and troubleshooting distributed systems is notoriously difficult; potential problems are complex, varied, and unpredictable. The monitoring and diagnosis tools commonly used today (logs, counters, and metrics) have two important limitations: what gets recorded is defined a priori, and the information is recorded in a component- or machine-centric way, making it extremely hard to correlate e...
Principled workflow-centric tracing of distributed systems
Workflow-centric tracing captures the workflow of causally related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be desi...
Simplified Semantics and Debugging of Concurrent Programs via Targeted Race Detection
of the Dissertation Simplified Semantics and Debugging of Concurrent Programs via Targeted Race Detection
Publication date: 2015